Results 1 - 20 of 21,367
1.
Nat Commun ; 15(1): 3126, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605047

ABSTRACT

Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time.


Subjects
Algorithms , Polymorphism, Single Nucleotide , Humans , Haplotypes/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software
2.
Biochemistry (Mosc) ; 89(Suppl 1): S234-S248, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38621753

ABSTRACT

This review highlights the operational principles, features, and modern aspects of the development of third-generation sequencing technology for biopolymers, focusing on nucleic acid analysis with the nanopore sequencing system. The basics of the method and the technical solutions used to realize it are considered, from the first works showing that such systems could be built to the easy-to-handle procedure developed by Oxford Nanopore Technologies. The review also covers applications developed and realized using Oxford Nanopore Technologies equipment, including whole-genome assembly, metagenomics, and direct analysis of the presence of modified bases.


Subjects
Nanopore Sequencing , Nanopores , Sequence Analysis, DNA/methods , Biopolymers , High-Throughput Nucleotide Sequencing/methods
3.
Genome Biol ; 25(1): 91, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589937

ABSTRACT

BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.


Subjects
Algorithms , Benchmarking , Humans , Genotype , Genomics/methods , Genotyping Techniques/methods , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
4.
Sci Rep ; 14(1): 7988, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580715

ABSTRACT

In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics , Genome, Human
5.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38565260

ABSTRACT

MOTIVATION: Automated chromatin segmentation based on ChIP-seq (chromatin immunoprecipitation followed by sequencing) data reveals insights into the epigenetic regulation of chromatin accessibility. Existing segmentation methods are constrained by simplifying modeling assumptions, which may have a negative impact on the segmentation quality. RESULTS: We introduce EpiSegMix, a novel segmentation method based on a hidden Markov model with flexible read count distribution types and state duration modeling, allowing for a more flexible modeling of both histone signals and segment lengths. In a comparison with existing tools, ChromHMM, Segway, and EpiCSeg, we show that EpiSegMix is more predictive of cell biology, such as gene expression. Its flexible framework enables it to fit an accurate probabilistic model, which has the potential to increase the biological interpretability of chromatin states. AVAILABILITY AND IMPLEMENTATION: Source code: https://gitlab.com/rahmannlab/episegmix.


Subjects
Chromatin , Epigenesis, Genetic , Sequence Analysis, DNA/methods , Histones/metabolism , Software , High-Throughput Nucleotide Sequencing/methods
6.
Genome Biol ; 25(1): 90, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589969

ABSTRACT

Single-cell ATAC-seq has emerged as a powerful approach for revealing candidate cis-regulatory elements genome-wide at cell-type resolution. However, current single-cell methods suffer from limited throughput and high costs. Here, we present a novel technique called scifi-ATAC-seq, single-cell combinatorial fluidic indexing ATAC-sequencing, which combines a barcoded Tn5 pre-indexing step with droplet-based single-cell ATAC-seq using the 10X Genomics platform. With scifi-ATAC-seq, up to 200,000 nuclei across multiple samples can be indexed in a single emulsion reaction, representing an approximately 20-fold increase in throughput compared to the standard 10X Genomics workflow.


Subjects
Chromatin Immunoprecipitation Sequencing , Chromatin , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Cell Nucleus
7.
Sci Rep ; 14(1): 9000, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637641

ABSTRACT

Long-read genome sequencing (lrGS) is a promising method in genetic diagnostics. Here we investigate the potential of lrGS to detect a disease-associated chromosomal translocation between 17p13 and the chromosome 19 centromere. We constructed two sets of phased and non-phased de novo assemblies: (i) based on lrGS only and (ii) hybrid assemblies combining lrGS with optical mapping, using lrGS reads with a median coverage of 34X. Variant calling detected both structural variants (SVs) and small variants, and the accuracy of the small variant calling was compared with that of short-read genome sequencing (srGS). The de novo and hybrid assemblies had high quality and contiguity, with an N50 of 62.85 Mb, enabling a near telomere-to-telomere assembly with fewer than 100 contigs per haplotype. Notably, we successfully identified the centromeric breakpoint of the translocation. A concordance of 92% was observed when comparing small variant calling between srGS and lrGS. In summary, our findings underscore the remarkable potential of lrGS as a comprehensive and accurate solution for the analysis of SVs and small variants. Thus, lrGS could replace a large battery of genetic tests that were used for the diagnosis of a single symptomatic translocation carrier, highlighting the potential of lrGS in the realm of digital karyotyping.
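
The abstract reports contiguity as an N50 value. As a reminder of what that statistic measures, the following is a minimal sketch of the standard N50 definition: the contig length at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units, not real data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A higher N50 means that fewer, longer contigs account for half of the assembly, which is why it is commonly quoted alongside the contig count.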


Subjects
High-Throughput Nucleotide Sequencing , Translocation, Genetic , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Base Sequence , Centromere/genetics
8.
J Med Virol ; 96(4): e29618, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38639293

ABSTRACT

Human adenovirus (HAdV) is a significant viral pathogen causing severe acute respiratory infections (SARIs) in children. To improve the understanding of type distribution and viral genetic characterization of HAdV in severe cases, this study enrolled 3404 pediatric SARI cases from eight provinces of China spanning 2017-2021, resulting in the acquisition of 112 HAdV strains. HAdV-type identification, based on three target genes (penton base, hexon, and fiber), confirmed the diversity of HAdV types in SARI cases. Twelve types were identified, including species B (HAdV-3, 7, 55), species C (HAdV-1, 2, 6, 89, 108, P89H5F5, Px1/Ps3H1F1, Px1/Ps3H5F5), and E (HAdV-4). Among these, HAdV-3 exhibited the highest detection rate (44.6%), followed by HAdV-7 (19.6%), HAdV-1 (12.5%), and HAdV-108 (9.8%). All HAdV-3, 7, 55, 4 in this study belonged to dominant lineages circulating worldwide, and the sequences of the three genes demonstrated significant conservation and stability. Concerning HAdV-C, excluding the novel type Px1/Ps3H1F1 found in this study, the other seven types were detected both in China and abroad, with HAdV-1 and HAdV-108 considered the two main types of HAdV-C prevalent in China. Two recombinant strains, including P89H5F5 and Px1/Ps3H1F1, could cause SARI as a single pathogen, warranting close monitoring and investigation for potential public health implications. In conclusion, 5 years of SARI surveillance in China provided crucial insights into HAdV-associated respiratory infections among hospitalized pediatric patients.


Subjects
Adenovirus Infections, Human , Adenoviruses, Human , Respiratory Tract Infections , Child , Humans , Adenoviruses, Human/genetics , Sequence Analysis, DNA/methods , Phylogeny , Adenoviridae/genetics , China/epidemiology , Respiratory Tract Infections/epidemiology
9.
Genome Biol ; 25(1): 101, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641647

ABSTRACT

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.


Subjects
Genome , Genomics , Genomics/methods , Computational Biology , INDEL Mutation , Bias , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods
10.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38569896

ABSTRACT

MOTIVATION: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. RESULTS: Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues. AVAILABILITY AND IMPLEMENTATION: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.


Subjects
High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Gene Library , Genotype , Cluster Analysis
11.
BMC Genomics ; 25(1): 365, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622536

ABSTRACT

BACKGROUND: Microbial genomes are largely comprised of protein coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes.
Our results suggest that high read coverage is required for correct assembly and indicate an inflated number of pseudogenes due to internal stops is indicative of poor overall assembly quality.


Subjects
Genome, Bacterial , Pseudogenes , Pseudogenes/genetics , Chromosome Mapping , Base Sequence , Genome, Microbial , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
12.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38626722

ABSTRACT

BACKGROUND: Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS: We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY outperforms in the heterochromatic region and demonstrates comparable performance in other regions. Furthermore, SRY enhances the capabilities of the hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS: Our method enables true complete genome assembly and facilitates downstream research of sex-limited chromosomes.


Subjects
Genome , Sex Chromosomes , Sex Chromosomes/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
13.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615

ABSTRACT

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.
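
The abstract describes a model that takes short k-mer profiles of sequences as input. As an illustration of what such a profile can look like, the sketch below counts overlapping k-mers in a read and normalizes them into a fixed-length frequency vector. The function name `kmer_profile`, the default `k=3`, and the decision to ignore k-mers containing non-ACGT characters are assumptions made for this example; the abstract does not specify how MetageNN builds its features.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=3):
    """Return normalized k-mer frequencies in lexicographic k-mer order."""
    sequence = sequence.upper()
    # Fixed ordering of all 4**k possible k-mers over the DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Only k-mers made purely of A, C, G, T contribute to the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy read, not real sequencing data.
profile = kmer_profile("ACGTACGTAC", k=2)
print(len(profile))
```

Because the vector length depends only on k, reads of very different lengths map to inputs of the same size, and a single substituted base changes at most k of the counted k-mers.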


Subjects
Metagenomics , Viverridae , Animals , Metagenomics/methods , Neural Networks, Computer , Metagenome , Machine Learning , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
14.
Yi Chuan ; 46(4): 306-318, 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38632093

ABSTRACT

With the increasing number of complex forensic cases in recent years, it has become more important to combine different types of genetic markers, such as short tandem repeats (STRs), single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (InDels), and microhaplotypes (MHs), to provide more genetic information. In this study, we selected a total of 201 genetic markers, including 24 autosomal STRs (A-STRs), 24 Y chromosome STRs (Y-STRs), 110 A-SNPs, 24 Y-SNPs, 9 A-InDels, 1 Y-InDel, 8 MHs, and Amelogenin, to establish the HID_AM Panel v1.0, a Next-Generation Sequencing (NGS) detection system. Following the validation guidelines of the Scientific Working Group on DNA Analysis Methods (SWGDAM), we assessed the repeatability, accuracy, sensitivity, suitability for degraded samples, species specificity, and inhibitor resistance of this system. The typing results for 48 STRs and Amelogenin were completely consistent with those obtained using capillary electrophoresis. The system accurately detected 79 SNPs, as confirmed in parallel on an FGx sequencer with the ForenSeq™ DNA Signature Prep Kit. Complete allele typing results could be obtained with a DNA input of no less than 200 pg. The detection success rate of this system was significantly higher than that of the GlobalFiler™ kit when the degradation index of the mock degraded sample was greater than 15.87. When the concentration of hematin in the amplification system was ≤40 µmol/L, indigo blue was ≤2 mmol/L, or humic acid was ≤15 ng/µL, amplification was not significantly inhibited. The system barely amplified DNA extracts from duck, mouse, cow, rabbit, and chick. The detection rate of STRs on routine samples with this panel was 99.74%, while all the SNPs, InDels, and MHs were successfully detected. In summary, we set up an NGS individual typing panel of 201 genetic markers with high accuracy, sensitivity, species specificity, and inhibitor resistance, which is applicable to individual identification from degraded samples.


Subjects
DNA Fingerprinting , Polymorphism, Single Nucleotide , Female , Cattle , Animals , Mice , Rabbits , DNA Fingerprinting/methods , Genetic Markers , Amelogenin/genetics , Genotype , Polymerase Chain Reaction , Reproducibility of Results , High-Throughput Nucleotide Sequencing/methods , Microsatellite Repeats , DNA , Sequence Analysis, DNA/methods
15.
Sci Rep ; 14(1): 7731, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565928

ABSTRACT

Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage solutions. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach involves the noisy inference process, obstructing large composite alphabets. This paper introduces a novel approach for DNA-based data storage, offering, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties. These include information density and reconstruction probabilities, as well as required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. We performed simulations and show, for example, that the use of 2D Reed-Solomon error correction has significantly improved reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed the successful reconstruction, and established the robustness of our approach for different error types. Subsampling experiments supported the important role of sampling rate and its effect on the overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and describes some theoretical research questions and technical challenges. 
Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions.
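
The abstract describes alphabets in which each letter is a subset of a shortmer set. To show why such alphabets can carry more information per position than a single nucleotide, the sketch below assumes, purely for illustration, that every letter is a subset of exactly k out of N distinguishable shortmers. This gives C(N, k) possible letters and log2 C(N, k) bits per letter, compared with 2 bits for one of the four standard nucleotides. The values N = 16 and k = 5 and the function name `bits_per_letter` are hypothetical choices for the example, not parameters stated in the abstract.

```python
import math


def bits_per_letter(n_shortmers, subset_size):
    """Return (alphabet size, bits per letter) for fixed-size subset letters."""
    # Each letter is assumed to be an unordered subset of `subset_size`
    # shortmers drawn from `n_shortmers` distinguishable shortmers.
    alphabet_size = math.comb(n_shortmers, subset_size)
    return alphabet_size, math.log2(alphabet_size)


# Illustrative parameters only (an assumption, not from the abstract).
size, bits = bits_per_letter(16, 5)
print(size, round(bits, 2))

# Baseline: one standard nucleotide is one of 4 symbols, i.e. 2 bits.
print(bits_per_letter(4, 1))
```

This counts only the ideal information content of a letter; the achievable density in a real system also depends on the error-correction overhead and on the sequencing depth needed to infer each subset reliably.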


Subjects
DNA Replication , DNA , Sequence Analysis, DNA/methods , DNA/genetics , Algorithms , Information Storage and Retrieval
16.
Sci Rep ; 14(1): 7892, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570611

ABSTRACT

Haplotype-resolved genome assembly plays a crucial role in understanding allele-specific functions. However, obtaining haplotype-resolved assembly for auto-polyploid genomes remains challenging. Existing methods can be classified into reference-based phasing, assembly-based phasing, and gamete binning. Nevertheless, there is a lack of cost-effective and efficient methods for haplotyping auto-polyploid genomes. In this study, we propose a novel phasing algorithm called PolyGH, which combines Hi-C and gametic data. We conducted experiments on tetraploid potato cultivars and divided the method into three steps. Firstly, gametic data was utilized to bin non-collapsed contigs, followed by merging adjacent fragments of the same type within the same contig. Secondly, accurate Hi-C signals related to differential genomic regions were acquired using unique k-mers. Finally, collapsed fragments were assigned to haplotigs based on combined Hi-C and gametic signals. Comparing PolyGH with Hi-C-based and gametic data-based methods, we found that PolyGH exhibited superior performance in haplotyping auto-polyploid genomes when integrating both data types. This approach has the potential to enhance haplotype-resolved assembly for auto-polyploid genomes.


Subjects
Germ Cells , Polyploidy , Humans , Sequence Analysis, DNA/methods , Haplotypes/genetics , Alleles
17.
PLoS One ; 19(4): e0301446, 2024.
Article in English | MEDLINE | ID: mdl-38573983

ABSTRACT

Reductions in sequencing costs have enabled widespread use of shotgun metagenomics and amplicon sequencing, which have drastically improved our understanding of the microbial world. However, large sequencing projects are now hampered by the cost of library preparation and low sample throughput, compared to the actual sequencing costs. Here, we benchmarked three high-throughput DNA extraction methods: ZymoBIOMICS™ 96 MagBead DNA Kit, MP Biomedicals™ FastDNA™-96 Soil Microbe DNA Kit, and DNeasy® 96 PowerSoil® Pro QIAcube® HT Kit. The DNA extractions were evaluated based on length, quality, quantity, and the observed microbial community across five diverse soil types. DNA extraction of all soil types was successful for all kits; however, the DNeasy® 96 PowerSoil® Pro QIAcube® HT Kit excelled across all performance parameters. We further used the nanoliter dispensing system I.DOT One to miniaturize Illumina amplicon and metagenomic library preparation volumes by a factor of 5 and 10, respectively, with no significant impact on the observed microbial communities. With these protocols, DNA extraction, metagenomic library preparation, and amplicon library preparation for one 96-well plate take approximately 3, 5, and 6 hours, respectively. Furthermore, the miniaturization of amplicon and metagenome library preparation reduces the chemical and plastic costs from 5.0 to 3.6 and from 59 to 7.3 USD per sample, respectively. This enhanced efficiency and cost-effectiveness will enable researchers to undertake studies with greater sample sizes and diversity, thereby providing a richer, more detailed view of microbial communities and their dynamics.


Subjects
High-Throughput Nucleotide Sequencing , Metagenome , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , DNA , Soil , Metagenomics/methods
18.
HLA ; 103(4): e15473, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38575364
19.
Nat Commun ; 15(1): 2964, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580638

ABSTRACT

The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly on B. taurus (Bison×Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads.


Subjects
Diploidy , Nanopores , Alleles , Haplotypes , Heterozygote , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
20.
PeerJ ; 12: e17101, 2024.
Article in English | MEDLINE | ID: mdl-38500526

ABSTRACT

Background: Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of "truth" datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data. Results: Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers on FASTQ read order. Comparisons of variant call format files generated from the original and permutated FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization. Conclusion: The results of this study highlight the sensitivity of SV calling on the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. 
Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
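
The study compares calls made from original and permuted FASTQ files. A minimal sketch of how a read-order permutation can be produced reproducibly is given below, assuming unwrapped FASTQ input in which every record occupies exactly four lines. The function names `read_fastq_records` and `permute_fastq`, the seed handling, and the toy reads are assumptions for illustration; the abstract does not describe the authors' own shuffling procedure.

```python
import random


def read_fastq_records(lines):
    """Group FASTQ lines into 4-line records (header, bases, '+', quality)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must have a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def permute_fastq(lines, seed):
    """Return the FASTQ records in a seeded pseudo-random order."""
    records = read_fastq_records(lines)
    # A dedicated generator keeps the permutation reproducible per seed.
    rng = random.Random(seed)
    rng.shuffle(records)
    return records


# Hypothetical toy reads, not real sequencing data.
toy_fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "GGCC", "+", "IIII",
    "@r3", "TTAA", "+", "IIII",
]
for record in permute_fastq(toy_fastq, seed=1):
    print("\n".join(record))
```

Recording the seed alongside each permuted file makes it possible to regenerate exactly the same read order when repeating an analysis.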


Subjects
Algorithms , Genomics , Genomics/methods , Sequence Analysis, DNA/methods , Genome , DNA